On enhancing Katz-smoothing based back-off language model
Abstract
Although statistical language modeling plays an important role in speech recognition, some problems remain difficult to solve, notably the sparseness of training data. Two kinds of smoothing approaches, the back-off model and the interpolated model, have been proposed to address the imprecision of language models caused by sparse training data. This paper extends the back-off idea to the re-estimation of all word pairs, not only the unseen ones, and proposes a modified back-off method, referred to as Enhanced Katz smoothing with deleted interpolation (EKSWDI). A uniform expression and two simplified versions of the modified model are also given. Experiments on a Chinese pinyin-to-character conversion system and perplexity measurements show that the proposed model performs better than Katz smoothing.
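For readers unfamiliar with the baseline, the following is a minimal sketch of a bigram back-off model and a perplexity measure. It is not the EKSWDI method, whose formulas the abstract does not give. As a simplifying assumption, it uses a single fixed discount in place of the Good-Turing discounts used by Katz smoothing, and the names `train_backoff_bigram`, `perplexity`, and `discount` are hypothetical.

```python
from collections import Counter
import math


def train_backoff_bigram(tokens, discount=0.5):
    """Return prob(w, prev) for a back-off bigram model.

    Assumption: a fixed discount is subtracted from every seen bigram
    count (Katz smoothing proper derives discounts from Good-Turing).
    """
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    total = sum(unigrams.values())

    context = Counter()   # how often each word occurs as a bigram history
    followers = {}        # history -> set of words seen after it
    for (prev, w), count in bigrams.items():
        context[prev] += count
        followers.setdefault(prev, set()).add(w)

    def p_unigram(w):
        return unigrams[w] / total

    def prob(w, prev):
        if context[prev] == 0:
            # History never seen: fall back entirely to the unigram model.
            return p_unigram(w)
        seen = followers[prev]
        if w in seen:
            # Seen bigram: discounted relative frequency.
            return (bigrams[(prev, w)] - discount) / context[prev]
        # Probability mass freed by discounting the seen bigrams.
        reserved = discount * len(seen) / context[prev]
        # Unigram mass of the words not seen after this history.
        unseen_mass = 1.0 - sum(p_unigram(v) for v in seen)
        if unseen_mass <= 1e-12:
            return 0.0
        # Unseen bigram: share the reserved mass in proportion to unigrams.
        return reserved * p_unigram(w) / unseen_mass

    return prob


def perplexity(prob, tokens):
    """Perplexity of a token sequence under prob (fails on zero probabilities)."""
    pairs = list(zip(tokens, tokens[1:]))
    log_sum = sum(math.log2(prob(w, prev)) for prev, w in pairs)
    return 2 ** (-log_sum / len(pairs))
```

In this sketch only unseen word pairs receive probability from the lower-order model. The abstract describes the proposed method as extending re-estimation to all word pairs.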
Related resources
Improved Katz smoothing for language modeling in speech recognition
In this paper, a new method is proposed to improve the canonical Katz back-off smoothing technique in language modeling. The process of Katz smoothing is analyzed in detail and the global discounting parameters are selected for discounting. Furthermore, a modified version of the formula for discounting parameters is proposed, in which the discounting parameters are determined by not only the ...
Back-off smoothing evaluation over syntactic language models
Continuous Speech Recognition systems require a Language Model (LM) to represent the syntactic constraints of the language. In LM development, a smoothing technique needs to be applied so that events not represented in the training corpus are also considered. In this work, several back-off smoothing approaches have been compared: classical discounting-distribution schema including Witten-Bell, Absolute ...
Morpheme Based Language Model for Tamil Speech Recognition System
This paper describes the design of a morpheme-based language model for the Tamil language. It aims to alleviate the main problems encountered in processing Tamil, such as the enormous vocabulary growth caused by the large number of different forms derived from one word. The size of the vocabulary is reduced by decomposing the words into stems and endings and storing these subword units (morphemes...
Less is More: Significance-Based N-gram Selection for Smaller, Better Language Models
The recent availability of large corpora for training N-gram language models has shown the utility of models of higher order than just trigrams. In this paper, we investigate methods to control the increase in model size resulting from applying standard methods at higher orders. We introduce significance-based N-gram selection, which not only reduces model size, but also improves perplexity for...
Publication date: 2000